tags:
- SQL

Lecture 1 - Relating (NC)
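This lecture continues from the previous one: you will learn more about what a relational database is. Later, we will extend the first lecture, Querying, by learning about nested querying.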
.tables
This command returns the names of the tables in longlist.db, 7 in all.
Look at the tables in longlist.db and try to imagine relationships between them. Some examples are: authors write books, publishers publish books, and translators translate books.
Consider the authors and books tables, looking only at the author name and book title columns.
Looking at the authors table alone would give us no information about the books written by each author.
One way to organize the data is to rely on row order: the first row in the authors table will always correspond to the first row in the books table. The problem with this system is that one may make a mistake (add a book but forget to add its corresponding author, or vice versa). Also, an author may have written more than one book, or a book may be co-written by multiple authors.



[Figure: diagram of the entities in longlist.db. Entities: Author, Book, Publisher, Translator, Rating. Relationships: wrote, published, translated, has.]



If we have some database, how do we know the relationships among the entities stored inside of it?
Once we know that a relationship exists between certain entities, how do we implement that in our database?

Notice how the primary key of the books table is now a column in the ratings table. This helps form a one-to-many relationship between the two tables: a book with a title (found in the books table) can have multiple ratings (found in the ratings table). This implements the relationship between the books and ratings entities.
Here’s an example of a many-to-many relationship.
There is now a table called authored that maps the primary key of books (book_id) to the primary key of authors (author_id).
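The junction-table idea can be tried out with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the notes, but the sample authors and books are invented for illustration and are not taken from longlist.db.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
    -- Junction table: each row pairs one author ID with one book ID.
    CREATE TABLE "authored" (
        "author_id" INTEGER REFERENCES "authors"("id"),
        "book_id" INTEGER REFERENCES "books"("id")
    );
    INSERT INTO "authors" ("id", "name") VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO "books" ("id", "title") VALUES (1, 'Book X'), (2, 'Book Y');
    -- Book X is co-written by both authors; Author A also wrote Book Y.
    INSERT INTO "authored" ("author_id", "book_id") VALUES (1, 1), (2, 1), (1, 2);
""")

# Every (author, book) pairing stored in the junction table.
pairs = conn.execute(
    'SELECT "author_id", "book_id" FROM "authored" ORDER BY "author_id", "book_id"'
).fetchall()

# Number of authors recorded for book 1.
coauthors = conn.execute(
    'SELECT COUNT(*) FROM "authored" WHERE "book_id" = 1'
).fetchone()[0]

print(pairs, coauthors)
```

One author can appear in several rows and so can one book, which is what makes the relationship many-to-many.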
Can the IDs of the author and the book be the same? For example, if author_id is 1 and book_id is also 1 in the authored table, will there be a mix-up?
Tables like authored are called “joint” or “junction” tables. In such tables, we usually know which primary key is referenced by which column. In this case, since we know that the first column contains the primary key of authors only and the second column similarly contains the primary key of books only, it would be okay even if the values matched!
If we have a lot of joint tables like this, wouldn’t that take up too much space?
On changing the ID of a book or author, does the ID get updated in the other tables as well?
In the books table, we have an ID to indicate the publisher, which is a foreign key taken from the publishers table. To find out the books published by Fitzcarraldo Editions, we would need two queries: one to find the publisher_id of Fitzcarraldo Editions from the publishers table, and a second to use this publisher_id to find all the books published by Fitzcarraldo Editions. These two queries can be combined into one using the idea of a subquery.
SELECT "title"
FROM "books"
WHERE "publisher_id" = (
SELECT "id"
FROM "publishers"
WHERE "publisher" = 'Fitzcarraldo Editions'
);
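Notice that the subquery is enclosed in parentheses. It runs first, and its result is then used by the outer query.
A subquery can likewise find all the ratings for the book In Memory of Memory.
SELECT "rating"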
FROM "ratings"
WHERE "book_id" = (
SELECT "id"
FROM "books"
WHERE "title" = 'In Memory of Memory'
);
To find the average rating of this book, apply AVG to the same query.
SELECT AVG("rating")
FROM "ratings"
WHERE "book_id" = (
SELECT "id"
FROM "books"
WHERE "title" = 'In Memory of Memory'
);
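To see that a subquery gives the same answer as two separate lookups, here is a small sketch in Python's sqlite3 module. The books and ratings tables mirror the ones in the notes, but the titles and rating values are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
    CREATE TABLE "ratings" ("book_id" INTEGER REFERENCES "books"("id"), "rating" INTEGER);
    INSERT INTO "books" ("id", "title") VALUES (1, 'Book X'), (2, 'Book Y');
    INSERT INTO "ratings" ("book_id", "rating") VALUES (1, 4), (1, 5), (1, 3), (2, 2);
""")

# Two separate queries: first look up the book's ID, then use it.
(book_id,) = conn.execute(
    'SELECT "id" FROM "books" WHERE "title" = ?', ("Book X",)
).fetchone()
two_step = conn.execute(
    'SELECT "rating" FROM "ratings" WHERE "book_id" = ? ORDER BY "rating"', (book_id,)
).fetchall()

# One query: the parenthesised subquery finds the ID for the outer query.
one_step = conn.execute("""
    SELECT "rating" FROM "ratings"
    WHERE "book_id" = (
        SELECT "id" FROM "books" WHERE "title" = 'Book X'
    )
    ORDER BY "rating"
""").fetchall()

# The same subquery combined with an aggregate function.
(average,) = conn.execute("""
    SELECT AVG("rating") FROM "ratings"
    WHERE "book_id" = (
        SELECT "id" FROM "books" WHERE "title" = 'Book X'
    )
""").fetchone()

print(two_step, one_step, average)
```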
To find the author(s) of the book Flights, three tables need to be queried: books, authors and authored.
SELECT "name"
FROM "authors"
WHERE "id" = (
SELECT "author_id"
FROM "authored"
WHERE "book_id" = (
SELECT "id"
FROM "books"
WHERE "title" = 'Flights'
)
);
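The first query that is run is the most deeply nested one: finding the ID of the book Flights. Then, the ID of the author(s) who wrote Flights is found. Last, this is used to retrieve the author name(s).
IN
An author may have written more than one book. To find the titles of all the books written by a given author, we can use the IN keyword as follows.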
SELECT "title"
FROM "books"
WHERE "id" IN (
SELECT "book_id"
FROM "authored"
WHERE "author_id" = (
SELECT "id"
FROM "authors"
WHERE "name" = 'Fernanda Melchor'
)
);
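The difference between IN and = only shows up when a subquery returns several rows, so here is a minimal sketch with Python's sqlite3 module. The author and book names are invented placeholders. The second query illustrates SQLite's behavior specifically: when a multi-row subquery is compared with =, SQLite uses only the first row it returns, whereas other database engines may raise an error instead.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
    CREATE TABLE "authored" ("author_id" INTEGER, "book_id" INTEGER);
    INSERT INTO "authors" ("id", "name") VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO "books" ("id", "title") VALUES (1, 'Book X'), (2, 'Book Y'), (3, 'Book Z');
    -- Author A wrote two books, Author B wrote one.
    INSERT INTO "authored" ("author_id", "book_id") VALUES (1, 1), (1, 2), (2, 3);
""")

# IN matches against every book ID returned by the middle subquery.
titles_in = [row[0] for row in conn.execute("""
    SELECT "title" FROM "books"
    WHERE "id" IN (
        SELECT "book_id" FROM "authored"
        WHERE "author_id" = (
            SELECT "id" FROM "authors" WHERE "name" = 'Author A'
        )
    )
    ORDER BY "title"
""")]

# With = in SQLite, only one row of the subquery is used,
# so one of Author A's books is silently dropped.
titles_eq = [row[0] for row in conn.execute("""
    SELECT "title" FROM "books"
    WHERE "id" = (
        SELECT "book_id" FROM "authored" WHERE "author_id" = 1
    )
""")]

print(titles_in, titles_eq)
```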
Note that the innermost query uses = and not the IN operator. This is because we expect to find just one author named Fernanda Melchor.
What if the value of an inner query is not found?
Is it necessary to use four spaces to indent a subquery?
How can we implement a many-to-one relationship between tables?
Consider a book co-written by multiple authors. We would have an authored table with multiple entries for the same book ID. Each of these entries would have a different author ID. It is worth noting that foreign key values can be repeated within a table, but primary key values are always unique.
JOIN
To understand how JOIN works, consider a database of sea lions and their migration patterns.
[Figure: snapshot of the sea lions database.]
We want to join the sea_lions and migrations tables together such that each sea lion also has its corresponding migration information as an extension of the same row.
To try this out, first exit longlist.db using the .quit SQLite command. Then, open up sea_lions.db.
SELECT *
FROM "sea_lions"
JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
Notice that:
- The ON keyword is used to specify which values match between the tables being joined. It is not possible to join tables without matching values.
- If there are any IDs in one table not present in the other, this row will not be present in the joined table. This kind of join is called an INNER JOIN.
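The inner join behavior described above can be checked with a small sketch in Python's sqlite3 module. The table names and the shared id column follow the notes. The name and distance columns and all row values are assumptions made for illustration, since the actual columns of sea_lions.db are not shown here.

```python
import sqlite3

# In-memory stand-in for sea_lions.db with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "sea_lions" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER PRIMARY KEY, "distance" INTEGER);
    -- ID 3 has no migration data, and ID 4 has no sea lion entry.
    INSERT INTO "sea_lions" ("id", "name") VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    INSERT INTO "migrations" ("id", "distance") VALUES (1, 100), (2, 250), (4, 75);
""")

# A plain JOIN keeps only the IDs that appear in both tables.
inner_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance"
    FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()

print(inner_rows)
```

IDs 3 and 4 are missing from the result because each appears in only one of the two tables.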
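Some other ways of joining tables that allow us to retain unmatched IDs are LEFT JOIN, RIGHT JOIN and FULL JOIN. Each of these is a kind of OUTER JOIN.
A LEFT JOIN prioritizes the data in the left (or first) table. (Note: think of these as an OUTER JOIN first, with LEFT or RIGHT then deciding which table's rows are all kept.)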
SELECT *
FROM "sea_lions"
LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
This query would retain all sea lion data from the sea_lions table — the left one. Some rows in the joined table could be partially blank. This would happen if the right table didn’t have data for a particular ID.
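A minimal sketch of this in Python's sqlite3 module shows where the blank values come from. As before, the name and distance columns and the rows are hypothetical. Only the table names and the id column come from the notes.

```python
import sqlite3

# In-memory stand-in for sea_lions.db with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "sea_lions" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER PRIMARY KEY, "distance" INTEGER);
    INSERT INTO "sea_lions" ("id", "name") VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    INSERT INTO "migrations" ("id", "distance") VALUES (1, 100), (2, 250), (4, 75);
""")

# LEFT JOIN keeps every row of the left table ("sea_lions").
# Where "migrations" has no matching ID, its columns come back as NULL,
# which Python's sqlite3 module reports as None.
left_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance"
    FROM "sea_lions"
    LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()

print(left_rows)
```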
Similarly, a RIGHT JOIN retains all the rows from the right (or second) table. A FULL JOIN allows us to see the entirety of all tables.
As we can see, an OUTER JOIN could lead to empty or NULL values in the joined table.
Both tables in the sea lions database have the column id. Since the value on which we are joining the tables has the same column name in both tables, we can actually omit the ON section of the query while joining.
SELECT *
FROM "sea_lions"
NATURAL JOIN "migrations";
Notice that the result does not have a duplicate id column in this case. Also, this join works similarly to an INNER JOIN.
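The duplicate-column difference can be observed by comparing the column names each query returns. This is a sketch in Python's sqlite3 module. The name and distance columns and the rows are assumed for illustration.

```python
import sqlite3

# In-memory stand-in for sea_lions.db with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "sea_lions" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER PRIMARY KEY, "distance" INTEGER);
    INSERT INTO "sea_lions" ("id", "name") VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    INSERT INTO "migrations" ("id", "distance") VALUES (1, 100), (2, 250), (4, 75);
""")

# NATURAL JOIN matches on the shared column name ("id") automatically.
natural = conn.execute(
    'SELECT * FROM "sea_lions" NATURAL JOIN "migrations" ORDER BY "name"'
)
natural_columns = [column[0] for column in natural.description]
natural_rows = natural.fetchall()

# The explicit JOIN ... ON version returns "id" once per table.
inner = conn.execute(
    'SELECT * FROM "sea_lions" JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"'
)
inner_columns = [column[0] for column in inner.description]

print(natural_columns, inner_columns, natural_rows)
```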
In the sea lions database, how are the IDs created? Do they come from the sea_lions table or the migrations table?
If we are trying to join three tables, how can we know which the left or right tables are?
In each JOIN statement, the table before the keyword is the left one. The table that follows the JOIN keyword is the right table.
When we join tables, does the resulting joined table get saved? Can we reference it later without joining again?
In the way we have been using JOIN, the result is a temporary table or a result set. It can be used for the duration of the query.
There are many different kinds of JOIN. Is there a default one we should use?
The simplest kind, just JOIN, is actually an INNER JOIN, and that’s the default for SQL.
Back in longlist.db, some people may be both authors and translators. We can use the INTERSECT operator to find this set.
SELECT "name" FROM "translators"
INTERSECT
SELECT "name" FROM "authors";
To find everyone who is an author or a translator (or both), we can use the UNION operator.
SELECT "name" FROM "translators"
UNION
SELECT "name" FROM "authors";
Notice that every author and every translator is included in this result set, but only once!
We can also label each name with a profession in the result.
SELECT 'author' AS "profession", "name"
FROM "authors"
UNION
SELECT 'translator' AS "profession", "name"
FROM "translators";
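The INTERSECT and UNION queries above can be tried on a tiny data set. This is a sketch in Python's sqlite3 module with invented names, where one person appears in both tables.

```python
import sqlite3

# In-memory database; 'Person B' is both an author and a translator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "translators" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    INSERT INTO "authors" ("name") VALUES ('Person A'), ('Person B');
    INSERT INTO "translators" ("name") VALUES ('Person B'), ('Person C');
""")

# INTERSECT: names that appear in both tables.
both = conn.execute("""
    SELECT "name" FROM "translators"
    INTERSECT
    SELECT "name" FROM "authors"
""").fetchall()

# UNION: names from either table, with duplicates removed.
either = conn.execute("""
    SELECT "name" FROM "translators"
    UNION
    SELECT "name" FROM "authors"
    ORDER BY "name"
""").fetchall()

# Adding a profession column makes the two 'Person B' rows distinct,
# so UNION keeps both of them.
labelled = conn.execute("""
    SELECT 'author' AS "profession", "name" FROM "authors"
    UNION
    SELECT 'translator' AS "profession", "name" FROM "translators"
    ORDER BY "profession", "name"
""").fetchall()

print(both, either, labelled)
```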
Some people are authors but not translators. The EXCEPT keyword can be used to find such a set. In other words, the set of translators is subtracted from the set of authors to form this one.
SELECT "name" FROM "authors"
EXCEPT
SELECT "name" FROM "translators";
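A short sketch in Python's sqlite3 module, again with invented names, shows EXCEPT removing anyone who is also a translator. It then compares the result against the intersection to confirm the two sets share no names.

```python
import sqlite3

# In-memory database; 'Person B' is both an author and a translator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "translators" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    INSERT INTO "authors" ("name") VALUES ('Person A'), ('Person B');
    INSERT INTO "translators" ("name") VALUES ('Person B'), ('Person C');
""")

# EXCEPT: authors minus anyone who is also a translator.
only_authors = conn.execute("""
    SELECT "name" FROM "authors"
    EXCEPT
    SELECT "name" FROM "translators"
""").fetchall()

# The intersection, for comparison.
both = conn.execute("""
    SELECT "name" FROM "authors"
    INTERSECT
    SELECT "name" FROM "translators"
""").fetchall()

# Names that appear in both result sets; this should be empty.
overlap = set(only_authors) & set(both)

print(only_authors, both, overlap)
```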
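We can verify that no author-translator from the intersection set appears in this result set.
In the same way, the set of people who are translators but not authors can be found using EXCEPT.
These operators also work on other columns. The following query finds the books that Sophie Hughes and Margaret Jull Costa have both translated.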
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
SELECT "id" FROM "translators"
WHERE "name" = 'Sophie Hughes'
)
INTERSECT
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
SELECT "id" FROM "translators"
WHERE "name" = 'Margaret Jull Costa'
);
Each of the nested queries here finds the IDs of the books for one translator. The INTERSECT keyword is used to intersect the resulting sets and give us the books they have collaborated on.
Could we use INTERSECT, UNION etc. to perform operations on 3-4 sets?
Yes. To intersect three sets, we would use the INTERSECT operator twice. An important note: we have to make sure to have the same number and same types of columns in the sets to be combined using INTERSECT, UNION etc.
Consider the ratings table. For each book, we want to find the average rating of the book. To do this, we would first need to group ratings together by book and then average the ratings out for each book (each group).
SELECT "book_id", AVG("rating") AS "average rating"
FROM "ratings"
GROUP BY "book_id";
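In this query, the GROUP BY keyword was used to create groups for each book and then collapse the ratings of the group into an average rating!
To keep only the well-rated books, those with an average rating above 4.0, and to round the averages to 2 decimal places:
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"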
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0;
Note that the HAVING keyword is used here to specify a condition for the groups, instead of WHERE (which can only be used to specify conditions for individual rows).
Is it possible to see the number of ratings given to each book?
Yes, using the COUNT keyword.
SELECT "book_id", COUNT("rating")
FROM "ratings"
GROUP BY "book_id";
Is it also possible to sort the data obtained here?
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0
ORDER BY "average rating" DESC;
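The grouping queries in this section can be run end to end on a few sample rows. This is a minimal sketch in Python's sqlite3 module. The ratings table follows the notes, and the rating values are invented so that the averages are easy to check by hand.

```python
import sqlite3

# In-memory database with hypothetical ratings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER);
    INSERT INTO "ratings" ("book_id", "rating") VALUES
        (1, 5), (1, 4),                  -- average 4.5
        (2, 3), (2, 4),                  -- average 3.5
        (3, 5), (3, 5), (3, 5), (3, 4);  -- average 4.75
""")

# COUNT per group: how many ratings each book received.
counts = conn.execute("""
    SELECT "book_id", COUNT("rating")
    FROM "ratings"
    GROUP BY "book_id"
    ORDER BY "book_id"
""").fetchall()

# HAVING filters whole groups after averaging; ORDER BY sorts what remains.
well_rated = conn.execute("""
    SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
    FROM "ratings"
    GROUP BY "book_id"
    HAVING "average rating" > 4.0
    ORDER BY "average rating" DESC
""").fetchall()

print(counts, well_rated)
```

Book 2 is dropped by HAVING because its group average is 3.5, even though one of its individual ratings is 4.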